Combining phonological and acoustic ASR-free features for pathological speech intelligibility assessment
Intelligibility is widely used to measure the severity of articulatory problems in pathological speech. Recently, a number of automatic intelligibility assessment tools have been developed. Most of them use automatic speech recognizers (ASR) to compare the patient's utterance with the target text. These methods are bound to one language and tend to be less accurate when speakers hesitate or make reading errors. To circumvent these problems, two different ASR-free methods were developed over the last few years, making use only of the acoustic or phonological properties of the utterance. In this paper, we demonstrate that these ASR-free techniques are also able to predict intelligibility in other languages. Moreover, they prove to be complementary, resulting in even better intelligibility predictions when both methods are combined.
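The combination of the two ASR-free predictors can be illustrated with a simple late-fusion sketch. The weighted average below is an assumption for illustration; the paper's actual combination scheme may differ.

```python
def fuse_predictions(acoustic_score, phonological_score, weight=0.5):
    """Linear late fusion of two ASR-free intelligibility estimates.

    weight: contribution of the acoustic predictor (0..1). The fusion
    rule and the 0..100 score scale are illustrative assumptions.
    """
    return weight * acoustic_score + (1.0 - weight) * phonological_score

# Example: two predictors rate the same utterance on a 0..100 scale.
fused = fuse_predictions(72.0, 80.0, weight=0.25)
print(fused)  # 78.0
```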
Detection of persons with Parkinson's disease by acoustic, vocal, and prosodic analysis
70 % to 90 % of patients with Parkinson's disease (PD) show an affected voice. Various studies have revealed that voice and prosody are among the earliest indicators of PD. The aim of this study is to automatically detect whether a person's speech/voice is affected by PD. We employ acoustic features, prosodic features, and features derived from a two-mass model of the vocal folds on different kinds of speech tests: sustained phonations, syllable repetitions, read texts, and monologues. Classification is performed in each case by SVMs. A correlation-based feature selection was performed in order to identify the most important features for each of these systems. We report a recognition rate of 91 % with prosodic modeling when differentiating between normally speaking persons and speakers with PD in early stages. With acoustic modeling we achieved a recognition rate of 88 %, and with vocal modeling 79 %. After feature selection these results could be greatly improved, but we expect those results to be too optimistic. We show that read texts and monologues are the most meaningful tests when it comes to the automatic detection of PD based on articulation, voice, and prosodic evaluations. The most important prosodic features were based on energy, pauses, and F0. The masses and the spring compliances were found to be the most important parameters of the two-mass vocal fold model.
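The correlation-based selection step can be sketched as follows. This is a simplification that ranks features only by their correlation with the class label; full correlation-based feature selection also penalises inter-feature redundancy, and the toy data are invented for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(feature_matrix, labels, k):
    """Keep the indices of the k features most correlated (in absolute
    value) with the class label."""
    scores = []
    for j in range(len(feature_matrix[0])):
        column = [row[j] for row in feature_matrix]
        scores.append((abs(pearson(column, labels)), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]

# Toy data: feature 0 tracks the label, feature 1 is a constant.
X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
y = [0, 0, 1, 1]
print(select_features(X, y, k=1))  # [0]
```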
Age and gender recognition for telephone applications based on GMM supervectors and support vector machines
This paper compares two approaches to automatic age and gender classification with 7 classes. The first approach uses Gaussian Mixture Models (GMMs) with Universal Background Models (UBMs), which is well known from the task of speaker identification/verification. Training is performed by the EM algorithm or MAP adaptation, respectively. In the second approach, a GMM model is trained for each speaker of the test and training set. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different kernels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel, and a linear GMM distance kernel based on the KL divergence. With the SVM approach we improved the recognition rate to 74 % (p < 0.001) and are in the same range as humans. Index Terms—Acoustic signal analysis, speaker classification, age, gender, Gaussian mixture models (GMM), support vector machine (SVM)
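The supervector construction with the KL-based normalisation can be sketched as below. The toy component counts and dimensions are illustrative, not the paper's actual configuration.

```python
import numpy as np

def kl_supervector(weights, means, variances):
    """Stack the adapted per-component means of a speaker's GMM into one
    long supervector. Each mean is scaled by sqrt(w_c)/sigma_c, so a plain
    dot product between two such vectors acts as the KL-divergence-based
    linear distance kernel mentioned in the abstract.
    weights: (C,), means/variances: (C, D) arrays."""
    w = np.sqrt(np.asarray(weights, dtype=float))[:, None]       # (C, 1)
    scaled = w * np.asarray(means, dtype=float) / np.sqrt(
        np.asarray(variances, dtype=float))                      # (C, D)
    return scaled.reshape(-1)                                    # (C * D,)

# Toy speaker model: a 2-component GMM over 2-dimensional features.
sv = kl_supervector([0.5, 0.5],
                    [[1.0, 2.0], [3.0, 4.0]],
                    [[1.0, 1.0], [1.0, 1.0]])
print(sv.shape)  # (4,)
```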
Multi-class Detection of Pathological Speech with Latent Features: How does it perform on unseen data?
The detection of pathologies from speech features is usually defined as a
binary classification task with one class representing a specific pathology and
the other class representing healthy speech. In this work, we train neural
networks, large margin classifiers, and tree boosting machines to distinguish
between four different pathologies: Parkinson's disease, laryngeal cancer,
cleft lip and palate, and oral squamous cell carcinoma. We demonstrate that
latent representations extracted at different layers of a pre-trained wav2vec
2.0 system can be effectively used to classify these types of pathological
voices. We evaluate the robustness of our classifiers by adding room impulse
responses to the test data and by applying them to unseen speech corpora. Our
approach achieves unweighted average F1-Scores between 74.1% and 96.4%,
depending on the model and the noise conditions used. The systems generalize
and perform well on unseen data of healthy speakers sampled from a variety of
different sources.
Comment: Submitted to ICASSP 202
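Turning the frame-level latent representations into a classifier input can be sketched as mean pooling over time. The pooling strategy and the simulated 768-dimensional latents (the wav2vec 2.0 base hidden size) are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def utterance_embedding(hidden_states):
    """Mean-pool frame-level latents from one wav2vec 2.0 layer into a
    fixed-size utterance representation for a downstream classifier
    (SVM, tree boosting, or a small neural network).
    hidden_states: (n_frames, hidden_dim) array-like."""
    return np.asarray(hidden_states, dtype=float).mean(axis=0)

# Simulated latents: 50 frames of 768-dim features.
frames = np.random.default_rng(0).normal(size=(50, 768))
print(utterance_embedding(frames).shape)  # (768,)
```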
A Stutter Seldom Comes Alone -- Cross-Corpus Stuttering Detection as a Multi-label Problem
Most stuttering detection and classification research has viewed stuttering
as a multi-class classification problem or a binary detection task for each
dysfluency type; however, this does not match the nature of stuttering, in
which one dysfluency seldom comes alone but rather co-occurs with others. This
paper explores multi-language and cross-corpus end-to-end stuttering detection
as a multi-label problem using a modified wav2vec 2.0 system with an
attention-based classification head and multi-task learning. We evaluate the
method using combinations of three datasets containing English and German
stuttered speech, one containing speech modified by fluency shaping. The
experimental results and an error analysis show that multi-label stuttering
detection systems trained on cross-corpus and multi-language data achieve
competitive results, but performance on samples with multiple labels stays below
overall detection results.
Comment: Accepted for presentation at Interspeech 2023
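The multi-label decision rule that distinguishes this setup from multi-class classification can be sketched as an independent sigmoid per dysfluency type. The label set and logit values below are hypothetical.

```python
import numpy as np

def multilabel_decisions(logits, threshold=0.5):
    """Multi-label decision rule for stuttering detection: an independent
    sigmoid per dysfluency type, so several labels can fire for the same
    segment (e.g. a block co-occurring with a sound repetition), unlike a
    softmax over mutually exclusive classes."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return (probs >= threshold).astype(int).tolist()

# Hypothetical classifier outputs for
# [block, prolongation, sound repetition, word repetition, interjection]:
print(multilabel_decisions([2.1, -0.3, 1.4, -2.0, 0.1]))  # [1, 0, 1, 0, 1]
```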
Classifying Dementia in the Presence of Depression: A Cross-Corpus Study
Automated dementia screening enables early detection and intervention,
reducing costs to healthcare systems and increasing quality of life for those
affected. Depression has shared symptoms with dementia, adding complexity to
diagnoses. The research focus so far has been on binary classification of
dementia (DEM) and healthy controls (HC) using speech from picture description
tests from a single dataset. In this work, we apply established baseline
systems to discriminate cognitive impairment in speech from the semantic Verbal
Fluency Test and the Boston Naming Test using text, audio and emotion
embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform
cross-corpus and mixed-corpus experiments on two independently recorded German
datasets to investigate generalization to larger populations and different
recording conditions. In a detailed error analysis, we look at depression as a
secondary diagnosis to understand what our classifiers actually learn.
Comment: Accepted at INTERSPEECH 202
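The difference between the cross-corpus and mixed-corpus settings can be sketched as split logic. The corpus names and the 50/50 mixed split are placeholders, not the actual experimental partitions.

```python
def make_split(corpus_a, corpus_b, mixed=False):
    """Illustrative split logic: in the cross-corpus setting, train on one
    dataset and test on the other; in the mixed-corpus setting, both
    datasets contribute to train and test."""
    if mixed:
        half_a, half_b = len(corpus_a) // 2, len(corpus_b) // 2
        train = corpus_a[:half_a] + corpus_b[:half_b]
        test = corpus_a[half_a:] + corpus_b[half_b:]
        return train, test
    return list(corpus_a), list(corpus_b)

train, test = make_split(["a1", "a2", "a3", "a4"], ["b1", "b2"], mixed=True)
print(train)  # ['a1', 'a2', 'b1']
```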
Automatic detection of sigmatism in children
In this paper we propose an automatic system to detect sigmatism from the speech signal. Sigmatism occurs when the tongue is positioned incorrectly during the articulation of sibilant phones like /s/ and /z/. For our task we extracted various sets of features from speech: Mel-frequency cepstral coefficients, energies in specific bandwidths of the spectral envelope, and so-called supervectors, which are the parameters of an adapted speaker model. We then trained several classifiers on a speech database of German adults simulating three different types of sigmatism. Recognition results were calculated at the phone, word, and speaker level for both the simulated database and a database of pathological speakers. For the simulated database, we achieved recognition rates of up to 86 %, 87 %, and 94 % at the phone, word, and speaker level, respectively. The best classifier was then integrated into a Java applet that allows patients to record their own speech, either by pronouncing isolated phones, a specific word, or a list of words, and provides them with feedback on whether the sibilant phones are being pronounced correctly.
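The band-energy features can be sketched as summed power-spectrum energy in a frequency band. The band edges and the synthetic test signal are illustrative; misarticulated sibilants typically shift energy in the high-frequency part of the spectrum.

```python
import numpy as np

def band_energy(frame, sample_rate, low_hz, high_hz):
    """Energy of one speech frame inside a frequency band, one of the
    feature types described above for sibilant analysis."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    mask = (freqs >= low_hz) & (freqs < high_hz)
    return float(spectrum[mask].sum())

# Toy frame: a 6 kHz sinusoid sampled at 16 kHz puts its energy into
# the 4-8 kHz band rather than the 0-4 kHz band.
t = np.arange(512) / 16000.0
frame = np.sin(2 * np.pi * 6000.0 * t)
high = band_energy(frame, 16000, 4000, 8000)
low = band_energy(frame, 16000, 0, 4000)
print(high > low)  # True
```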
A survey on perceived speaker traits: personality, likability, pathology, and the first challenge
The INTERSPEECH 2012 Speaker Trait Challenge aimed at a unified test-bed for perceived speaker traits, the first challenge of this kind: personality in the five OCEAN personality dimensions, likability of speakers, and intelligibility of pathologic speakers. In the present article, we give a brief overview of the state-of-the-art in these three fields of research and describe the three sub-challenges in terms of the challenge conditions, the baseline results provided by the organisers, and a new openSMILE feature set, which has been used for computing the baselines and which has been provided to the participants. Furthermore, we summarise the approaches and the results presented by the participants to show the various techniques that are currently applied to solve these classification tasks.